DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics
Abstract
Many genomics data analyses involve measurements made on a multivariate response. For example, alternative splicing can produce multiple expressed isoforms from the same primary transcript. Differences in the relative ratio of expressed isoforms (e.g., between normal and disease states) may have significant phenotypic consequences or prognostic value. Similarly, knowledge of single nucleotide polymorphisms (SNPs) that affect splicing, so-called splicing quantitative trait loci (sQTL), will help to characterize the effects of genetic variation on gene expression. RNA sequencing (RNA-seq) provides an attractive toolbox for unraveling alternative splicing outcomes, and fast and accurate methods for transcript quantification have recently become available. We propose a statistical framework based on the Dirichlet-multinomial distribution that uses these quantifications to discover changes in isoform usage between conditions, as well as SNPs that affect the relative expression of transcripts. The Dirichlet-multinomial model naturally accounts for differential gene expression without losing information about overall gene abundance, and by modeling isoform expression jointly it can account for the correlated nature of isoforms. The main challenge in this approach is obtaining robust estimates of model parameters from a limited number of replicates. We address this by sharing information and show that our method improves on existing approaches in terms of standard statistical performance metrics. The framework is applicable to other multivariate scenarios, such as Poly-A-seq, or to settings where beta-binomial models have been applied (e.g., differential DNA methylation). Our method is available as a Bioconductor R package called DRIMSeq.
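To make the model in the abstract concrete, the following is a minimal Python sketch of the Dirichlet-multinomial log-probability for the isoform counts of one gene in one sample. It is not DRIMSeq's implementation (DRIMSeq is an R package); the function names, and the parameterization by isoform proportions plus a single precision value, are assumptions chosen for illustration. A plain multinomial is included for comparison, to show how a small precision lets the model tolerate sample-to-sample variability in isoform ratios.

```python
from math import lgamma, log

def dm_logpmf(counts, proportions, precision):
    """Log-probability of isoform counts under a Dirichlet-multinomial.

    Assumed parameterization (illustrative): alpha_j = proportions[j] * precision.
    A larger precision means less overdispersion around the proportions.
    """
    if precision <= 0 or any(p <= 0 for p in proportions):
        raise ValueError("precision and proportions must be positive")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    n = sum(counts)
    alphas = [p * precision for p in proportions]
    # Multinomial coefficient n! / prod(y_j!)
    log_coef = lgamma(n + 1) - sum(lgamma(y + 1) for y in counts)
    # Gamma(precision) / Gamma(n + precision)
    log_norm = lgamma(precision) - lgamma(n + precision)
    # prod_j Gamma(y_j + alpha_j) / Gamma(alpha_j)
    log_terms = sum(lgamma(y + a) - lgamma(a) for y, a in zip(counts, alphas))
    return log_coef + log_norm + log_terms

def multinomial_logpmf(counts, proportions):
    """Log-probability of counts under a plain multinomial (no overdispersion)."""
    n = sum(counts)
    log_coef = lgamma(n + 1) - sum(lgamma(y + 1) for y in counts)
    return log_coef + sum(y * log(p) for y, p in zip(counts, proportions))

if __name__ == "__main__":
    # Hypothetical gene with three isoforms; the counts are made up for illustration.
    counts = [30, 5, 15]
    proportions = [0.5, 0.2, 0.3]
    for precision in (1.0, 10.0, 1000.0):
        print(precision, dm_logpmf(counts, proportions, precision))
    print("multinomial", multinomial_logpmf(counts, proportions))
```

The total count of the gene enters only through `n`, so the proportions describe relative isoform usage separately from overall gene abundance. As the precision grows, the Dirichlet-multinomial log-probability approaches the multinomial one.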
Similar resources
DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics [version 2; referees: 2 approved]
There are many instances in genomics data analyses where measurements are made on a multivariate response. For example, alternative splicing can lead to multiple expressed isoforms from the same primary transcript. There are situations where differences (e.g. between normal and disease state) in the relative ratio of expressed isoforms may have significant phenotypic consequences or lead to pro...
DRIMSeq: Dirichlet-multinomial framework for differential transcript usage and transcript usage QTL analyses in RNA-seq
DRIMSeq: Dirichlet-multinomial framework for differential splicing and sQTL analyses in RNA-seq
3 Differential splicing analysis work-flow; 3.1 Example data; 3.2 Differential splicing analysis with DRIMSeq package; 3.2.1 Loading pasilla data into R; 3.2.2 Filtering ...
A New Utility-Consistent Econometric Approach to Multivariate Count Data Modeling
In the current paper, we propose a new utility-consistent modeling framework to explicitly link a count data model with an event type multinomial choice model. The proposed framework uses a multinomial probit kernel for the event type choice model and introduces unobserved heterogeneity in both the count and discrete choice components. Additionally, this paper establishes important new results ...
Dirichlet negative multinomial regression for overdispersed correlated count data
A generic random effects formulation for the Dirichlet negative multinomial distribution is developed together with a convenient regression parameterization. A simulation study indicates that, even when somewhat misspecified, regression models based on the Dirichlet negative multinomial distribution have smaller median absolute error than generalized estimating equations, with a particularly pr...